Analysing Text in Software Projects

Authors

  • Stefan Wagner
  • Daniel Méndez Fernández
Abstract

Most of the data produced in software projects is textual in nature: source code, specifications, or documentation. Advances in quantitative analysis methods have driven much of the data analytics in software engineering. This has, to some degree, overshadowed the importance of texts and their qualitative analysis. Such analysis nevertheless has merits for researchers and practitioners alike. In this chapter, we describe the basics of analysing text in software projects. We first describe how to manually analyse and code textual data. Next, we give an overview of mixed methods for automatic text analysis, ranging from N-grams and clone detection to more sophisticated natural language processing that identifies the syntax and context of words. These methods and tools are critical for coping with the huge amounts of textual data produced today. We illustrate the methods with a running example and conclude by presenting two industrial studies.
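As a rough illustration of the N-gram idea mentioned in the abstract, the sketch below extracts and counts word bigrams from a short piece of specification text. The helper name `word_ngrams`, the naive lowercase/regex tokenization, and the sample sentences are assumptions made for illustration; this is not the chapter's own tooling.

```python
import re
from collections import Counter


def word_ngrams(text: str, n: int = 2) -> list[tuple[str, ...]]:
    """Return the word n-grams of `text` (hypothetical helper, for illustration only)."""
    # Naive tokenization (an assumption): lowercase, keep runs of letters and digits.
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    # Slide a window of length n over the token list; yields [] if fewer than n tokens.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Hypothetical requirements text used only as sample input.
spec = (
    "The system shall log every failed login. "
    "The system shall lock the account after three failed logins."
)

# Count how often each bigram occurs across the text.
bigram_counts = Counter(word_ngrams(spec, n=2))
for gram, count in bigram_counts.most_common(3):
    print(" ".join(gram), count)
```

Bigrams that recur, such as "the system" and "system shall" here, hint at recurring phrasing. Counting repeated word sequences in this way is one simple starting point for spotting duplicated passages in larger texts.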

Journal:
  • CoRR

Volume: abs/1612.00164

Publication date: 2015